Digital Library

14 results for CHD Prediction, Blood Serum Data Chemometrics Methods

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna

Kernel Methods for Tree Structured Data

Relevance: 100.00%

Abstract:

Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional machine learning methods deal with vectorial information, they require an a priori preprocessing step. Among the learning techniques for structured data, kernel methods are recognized as effective approaches with a strong theoretical background. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and effective kernel functions is a challenging problem. In the case of tree-structured data two issues become relevant: a kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information to make correct predictions on unseen data; in fact, it tends to produce a discriminating function that behaves like the nearest-neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of the kernel in scenarios involving large amounts of data. This thesis proposes three contributions for resolving these issues of kernels for trees. The first contribution aims at creating kernel functions that adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees with an algorithm that projects the data onto a lower-dimensional space such that similar structures are mapped to similar representations. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matching between different inputs in the original space. The second contribution is a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose instead focuses specifically on this aspect. The third contribution is devoted to reducing the computational burden of calculating a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique for kernels such as the subtree and subset tree kernels. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
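The subtree kernel named in the abstract can be illustrated with a minimal Python sketch. It is not the method proposed in the thesis: it is a plain, unweighted subtree kernel that counts pairs of nodes rooting identical complete subtrees. The tuple-based tree encoding `(label, [children])` and the function names are assumptions made for this example.

```python
from collections import Counter


def subtree_signatures(tree, bag=None):
    """Return (signature, bag); bag counts the complete subtree rooted at every node.

    A tree is a hypothetical (label, [children]) pair. Children are ordered,
    and labels are assumed not to contain parentheses.
    """
    if bag is None:
        bag = Counter()
    label, children = tree
    child_sigs = [subtree_signatures(child, bag)[0] for child in children]
    signature = "(" + label + "".join(child_sigs) + ")"
    bag[signature] += 1
    return signature, bag


def subtree_kernel(t1, t2):
    """Count pairs of nodes (one per tree) that root identical complete subtrees."""
    _, bag1 = subtree_signatures(t1)
    _, bag2 = subtree_signatures(t2)
    return sum(count * bag2[sig] for sig, count in bag1.items())


t1 = ("a", [("b", []), ("c", [])])
t2 = ("d", [("b", []), ("c", [])])
t3 = ("x", [("y", [])])

print(subtree_kernel(t1, t2))  # 2: the leaves b and c match
print(subtree_kernel(t1, t1))  # 3: every subtree matches itself
print(subtree_kernel(t1, t3))  # 0: no shared subtree
```

The last call shows the sparsity issue in miniature: when node labels come from a large domain, most pairs of trees share no complete subtree and the kernel value is zero.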


Analytical challenges in the fields of Nanobioscience and Nanobiotecnology: integrated methods for separation and characterization

Relevance: 100.00%

Abstract:

Nano(bio)science and nano(bio)technology are attracting strong and growing interest in both academia and industry. They are undergoing rapid developments on many fronts such as genomics, proteomics, systems biology, and medical applications. However, the lack of characterization tools for nano(bio)systems is currently considered a major limiting factor for the final establishment of nano(bio)technologies. Flow Field-Flow Fractionation (FlFFF) is a separation technique that is clearly emerging in the bioanalytical field, and the number of applications to nano(bio)analytes such as high molar-mass proteins and protein complexes, sub-cellular units, viruses, and functionalized nanoparticles is constantly increasing. This can be ascribed to the intrinsic advantages of FlFFF for the separation of nano(bio)analytes. FlFFF is ideally suited to separating particles over a broad size range (1 nm-1 μm) according to their hydrodynamic radius (rh). The fractionation is carried out in an empty channel by a flow stream of a mobile phase of any composition. Fractionation therefore proceeds without surface interaction of the analyte with packing or gel media, and there is no stationary phase that could induce mechanical or shear stress on nanosized analytes, which are thus kept in their native state. Characterization of nano(bio)analytes is made possible after fractionation by interfacing the FlFFF system with detection techniques for morphological, optical or mass characterization. For instance, coupling FlFFF with multi-angle light scattering (MALS) detection allows absolute molecular weight and size determination, and mass spectrometry has brought FlFFF into the field of proteomics. The potential of FlFFF coupled with multi-detection systems is discussed in the first section of this dissertation. The second and third sections are dedicated to new methods that have been developed for the analysis and characterization of different samples of interest in the fields of diagnostics, pharmaceutics, and nanomedicine. The second section focuses on biological samples such as protein complexes and protein aggregates. In particular it focuses on FlFFF methods developed to give new insights into: a) the chemical composition and morphological features of blood serum lipoprotein classes, b) the time-dependent aggregation pattern of the amyloid protein Aβ1-42, and c) the aggregation state of antibody therapeutics in their formulation buffers. The third section is dedicated to the analysis and characterization of structured nanoparticles designed for nanomedicine applications. The discussed results indicate that FlFFF with on-line MALS and fluorescence detection (FD) may become an unparalleled methodology for the analysis and characterization of new, structured, fluorescent nanomaterials.


Development of Original Analytical Methods for the Determination of Drugs of Abuse in Biological Samples

Relevance: 100.00%

Abstract:

Drug abuse is a major global problem which has a strong impact not only on the single individual but also on the entire society. Among the different strategies that can be used to address this issue an important role is played by identification of abusers and proper medical treatment. This kind of therapy should be carefully monitored in order to discourage improper use of the medication and to tailor the dose according to the specific needs of the patient. Hence, reliable analytical methods are needed to reveal drug intake and to support physicians in the pharmacological management of drug dependence. In the present Ph.D. thesis original analytical methods for the determination of drugs with a potential for abuse and of substances used in the pharmacological treatment of drug addiction are presented. In particular, the work has been focused on the analysis of ketamine, naloxone and long-acting opioids (buprenorphine and methadone), oxycodone, disulfiram and bupropion in human plasma and in dried blood spots. The developed methods are based on the use of high performance liquid chromatography (HPLC) coupled to various kinds of detectors (mass spectrometer, coulometric detector, diode array detector). For biological sample pre-treatment different techniques have been exploited, namely solid phase extraction and microextraction by packed sorbent. All the presented methods have been validated according to official guidelines with good results and some of these have been successfully applied to the therapeutic drug monitoring of patients under treatment for drug abuse.


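Percezione della direzione del proprio movimento: dalla registrazione dell'attività corticale al modello computazionale. [Perception of the direction of self-motion: from cortical activity recording to the computational model]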

Relevance: 100.00%

Abstract:

This thesis is strongly interdisciplinary. It requires knowledge of neurophysiology, to understand the mechanisms underlying the phenomena studied; knowledge of and experience in electronics, needed to set up the experimental hardware that acquires neuronal data; and knowledge of informatics and programming, to write the code that controls the behaviour of the subjects during experiments and the visual presentation of stimuli. Finally, neuronal and statistical models must be well understood in order to interpret the data. The project started with a thorough literature review: the mechanisms of heading perception (perception of the direction of motion) are still poorly understood. The main interest is to understand how visual information about our own motion is integrated with eye position information. To investigate the cortical response to moving visual stimuli and its integration with eye position, we decided to study an animal model, using optic flow expansion and contraction as visual stimuli. The first chapter of the thesis presents the basic aims of the research project, together with the reasons why it is interesting and important to study perception of motion. This chapter also describes the methods my research group considered most suitable for contributing to the scientific community and highlights my personal contribution to the project. The second chapter presents an overview of the background needed to follow the main part of the thesis: it starts with a brief introduction to the central nervous system and cortical functions, then presents in more depth the association areas, which are the main target of our study. Furthermore, it explains why studies on animal models are necessary to understand mechanisms at the cellular level that could not be addressed in any other way. In the second part of the chapter, the basics of electrophysiology and cellular communication are presented, together with traditional methods for neuronal data analysis. The third chapter is intended as a resource for future work in the laboratory: it presents the hardware used for experimental sessions, how to control animal behaviour during the experiments by means of C routines and software, and how to present visual stimuli on a screen. The fourth chapter is the core of the research project and of the thesis. The methods present the experimental paradigms, visual stimuli and data analysis. The results show the cellular responses of area PEc to moving visual stimuli combined with different eye positions. In brief, this study led to the identification of different cellular behaviours in relation to the focus of expansion (the direction of motion given by the optic flow pattern) and eye position. The originality and importance of the results are pointed out in the conclusions: this is the first study aimed at investigating perception of motion in this particular cortical area. In the last paragraph, a neuronal network model is presented, which aims to simulate the pre-saccadic and post-saccadic responses of neurons in area PEc during eye movement tasks. The data presented in chapter four are further analysed in chapter five. The analysis started from the observation of the neuronal responses during a 1 s time period in which the visual stimulation was the same. It was clear that cell activity showed oscillations in time that had been neglected by the previous analysis based on mean firing frequency. The results distinguished two cellular behaviours by their response characteristics: some neurons showed oscillations that changed depending on eye and optic flow position, while others kept the same oscillation characteristics independent of the stimulus. The last chapter discusses the results of the research project, comments on the originality and interdisciplinarity of the study, and proposes some future developments.


Measurement of the differential cross section of tt pairs in pp collision at sqrt(s) = 7TeV with the ATLAS detector at the LHC

Relevance: 100.00%

Abstract:

This thesis presents three measurements of the top-antitop differential cross section at a centre-of-mass energy of 7 TeV, as functions of the transverse momentum, the mass and the rapidity of the top-antitop system. The analysis was carried out on a data sample of about 5/fb recorded with the ATLAS detector. The events were selected with a cut-based approach in the "one lepton plus jets" channel, where the lepton can be either an electron or a muon. The most relevant backgrounds (multi-jet QCD and W+jets) were estimated using data-driven methods; the others (Z+jets, diboson and single top) were simulated with Monte Carlo techniques. The final, background-subtracted distributions were corrected for detector and selection effects using unfolding methods. Finally, the results were compared with the theoretical predictions. The measurements are dominated by systematic uncertainties and show no relevant deviation from the Standard Model predictions.


Gait analysis using a single wearable inertial measurement unit

Relevance: 100.00%

Abstract:

Procedures for quantitative walking analysis include the assessment of body segment movements within defined gait cycles. Recently, methods to track human body motion using inertial measurement units have been suggested. It is not known if these techniques can be readily transferred to clinical measurement situations. This work investigates the aspects necessary for one inertial measurement unit mounted on the lower back to track orientation, and determine spatio-temporal features of gait outside the confines of a conventional gait laboratory. Apparent limitations of different inertial sensors can be overcome by fusing data using methods such as a Kalman filter. The benefits of optimizing such a filter for the type of motion are unknown. 3D accelerations and 3D angular velocities were collected for 18 healthy subjects while treadmill walking. Optimization of Kalman filter parameters improved pitch and roll angle estimates when compared to angles derived using stereophotogrammetry. A Weighted Fourier Linear Combiner method for estimating 3D orientation angles by constructing an analytical representation of angular velocities and allowing drift free integration is also presented. When tested this method provided accurate estimates of 3D orientation when compared to stereophotogrammetry. Methods to determine spatio-temporal features from lower trunk accelerations generally require knowledge of sensor alignment. A method was developed to estimate the instants of initial and final ground contact from accelerations measured by a waist mounted inertial device without rigorous alignment. A continuous wavelet transform method was used to filter and differentiate the signal and derive estimates of initial and final contact times. The technique was tested with data recorded for both healthy and pathologic (hemiplegia and Parkinson’s disease) subjects and validated using an instrumented mat. 
The results show that a single inertial measurement unit can assist whole-body gait assessment; however, further investigation is required to understand altered gait timing in some pathological subjects.
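The sensor fusion step mentioned in the abstract can be sketched with a one-dimensional Kalman filter. This is an illustrative simplification, not the filter used in the thesis: it tracks a single tilt angle, predicts it by integrating the gyroscope rate, and corrects it with an accelerometer-derived angle. The function name, the noise parameters `q` and `r`, and their default values are assumptions made for this example.

```python
def kalman_tilt(gyro_rates, accel_angles, dt, q=0.01, r=1.0, angle0=0.0, p0=1.0):
    """Fuse gyroscope rates and accelerometer angles into one tilt estimate per sample.

    q is the assumed process noise variance (trust in the gyro integration),
    r is the assumed measurement noise variance (trust in the accelerometer).
    """
    angle, p = angle0, p0
    estimates = []
    for rate, measured in zip(gyro_rates, accel_angles):
        # Predict: integrate the angular velocity and grow the uncertainty.
        angle += rate * dt
        p += q
        # Update: blend in the accelerometer angle using the Kalman gain.
        gain = p / (p + r)
        angle += gain * (measured - angle)
        p *= 1.0 - gain
        estimates.append(angle)
    return estimates


# Stationary sensor tilted at 10 degrees: zero gyro rate, constant accelerometer angle.
estimates = kalman_tilt([0.0] * 200, [10.0] * 200, dt=0.01)
print(round(estimates[-1], 3))  # converges to 10.0
```

Raising `r` relative to `q` makes the estimate follow the gyroscope more closely, and lowering it favours the accelerometer. Choosing these values for a given type of motion is a simple analogue of the filter parameter optimization the abstract describes.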


Composti perfluorurati: Valutazione degli effetti biologici e molecolari in modelli cellulari [Perfluorinated compounds: evaluation of biological and molecular effects in cellular models]

Relevance: 100.00%

Abstract:

Perfluorooctanoic acid (PFOA) and perfluorononanoic acid (PFNA) are perfluorinated compounds (PFCs) that have been commonly used in industry over the last 60 years for a variety of applications. Because of their resistance to degradation, these compounds can accumulate in the environment and in living organisms, from which they can be taken up, particularly through the diet. Existing evidence on the effects of exposure in animals, including potential carcinogenicity, has increased interest in the possible risks to human health. Recent human studies indicate that PFCs are present in serum, with very high levels especially in chronically exposed workers, and are positively associated with breast and prostate cancer. In addition, estrogen-like properties and changes in methylation levels at the promoters of some genes have been reported. In utero exposure has been positively associated with global DNA hypomethylation in cord serum. The aim of this study was to investigate the effects of exposure to these perfluorinated compounds on human tumour and primary cell lines (MOLM-13, RPMI, HEPG2, MCF7, WBC, HMEC and MCF12A) belonging to different target tissues, using a wide range of concentrations (3.12 nM - 500 μM). In particular, the following were evaluated: viability, cell cycle, gene expression, global DNA methylation and gene-specific methylation. The results showed that both perfluorinated compounds have biological effects: PFOA has a predominantly cytostatic effect, PFNA a predominantly cytotoxic one. The effect is, however, most pronounced in the primary mammary epithelial cell lines (HMEC, MCF12A), even at concentrations found in chronically exposed workers (≥31.25 µM). Analysis of these primary cells shows no significant changes in global DNA methylation at concentrations of 15.6 and 31.25 µM. Changes do emerge, however, in marker genes for breast cancer, the cell cycle, apoptosis, and the PPAR-α and estrogen pathways, at a concentration of 31.25 µM of both PFCs.


Temperature Variation Aware Energy Optimization in Heterogeneous MPSoCs

Relevance: 100.00%

Abstract:

Thermal effects are rapidly gaining importance in nanometer heterogeneous integrated systems. Increased power density, coupled with spatio-temporal variability of chip workload, causes lateral and vertical temperature non-uniformities (variations) in the chip structure. Assuming a uniform temperature for a large circuit leads to inaccurate determination of key design parameters. To improve design quality, we need precise estimation of temperature at detailed spatial resolution, which is very computationally intensive. Consequently, thermal analysis of the designs needs to be done at multiple levels of granularity. To further investigate the flow of chip/package thermal analysis we exploit the Intel Single Chip Cloud Computer (SCC) and propose a methodology for calibration of SCC on-die temperature sensors. We also develop an infrastructure for online monitoring of SCC temperature sensor readings and SCC power consumption. With the thermal simulation tool in hand, we propose MiMAPT, an approach for analyzing delay, power and temperature in digital integrated circuits. MiMAPT integrates seamlessly into industrial front-end and back-end chip design flows. It accounts for temperature non-uniformities and self-heating while performing analysis. Furthermore, we extend the temperature variation aware analysis of designs to 3D MPSoCs with Wide-I/O DRAM. We improve the DRAM refresh power by considering the lateral and vertical temperature variations in the 3D structure and adapting the per-DRAM-bank refresh period accordingly. We develop an advanced virtual platform which models the performance, power, and thermal behavior of a 3D-integrated MPSoC with Wide-I/O DRAMs in detail. Moving towards real-world multi-core heterogeneous SoC designs, a reconfigurable heterogeneous platform (ZYNQ) is exploited to further study the performance and energy efficiency of various CPU-accelerator data sharing methods in heterogeneous hardware architectures. A complete hardware accelerator featuring clusters of OpenRISC CPUs with dynamic address remapping capability is built and verified on real hardware.


Flow Field–Flow Fractionation for size analysis and characterization of nanoparticles for applications in Life Sciences

Relevance: 100.00%

Abstract:

Nanotechnologies are rapidly expanding because of the opportunities that the new materials offer in many areas such as the manufacturing industry, food production, processing and preservation, and the pharmaceutical and cosmetic industries. The size distribution of nanoparticles determines their properties and is a fundamental parameter that needs to be monitored from small-scale synthesis up to bulk production and quality control of nanotech products on the market. A consequence of the increasing number of applications of nanomaterials is that the EU regulatory authorities are introducing an obligation for companies that use nanomaterials to acquire analytical platforms for assessing the size parameters of those nanomaterials. In this work, Asymmetrical Flow Field-Flow Fractionation (AF4) and Hollow Fiber F4 (HF5), hyphenated with Multiangle Light Scattering (MALS), are presented as tools for a deep functional characterization of nanoparticles. In particular, the applicability of AF4-MALS to the characterization of liposomes in a wide range of media is demonstrated. The technique is then used to explore the functional features of a liposomal drug vector in terms of its biological and physical interaction with blood serum components: a comprehensive approach to understanding the behavior of lipid vesicles in terms of drug release and fusion/interaction with other biological species is described, together with the weaknesses and strengths of the method. Next, the size characterization, size stability, and conjugation of azidothymidine drug molecules with a new generation of metastable drug vectors, the Metal Organic Frameworks, are discussed. Lastly, the applicability of HF5-ICP-MS to the rapid screening of samples of relevant nanorisk is shown: rather than a deep and comprehensive characterization, this time a quick and smart methodology is presented that, within a few steps, provides qualitative information on the content of metallic nanoparticles in tattoo ink samples.


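Indagini forensi in tema di scambio di file pedopornografici mediante software di file sharing a mezzo peer-to-peer [Forensic investigations into the exchange of child pornography files via peer-to-peer file sharing software]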

Relevance: 100.00%

Abstract:

Digital evidence requires the same precautions as any other scientific examination. An overview is given of the methodological and practical aspects of digital forensics in light of the recent ISO/IEC 27037:2012 standard on the handling of digital evidence in the phases of identification, collection, acquisition and preservation of digital data. These methodologies adhere strictly to the integrity and authenticity requirements set by the legislation on digital forensics, in particular Law 48/2008 ratifying the Budapest Convention on Cybercrime. With regard to the offence of child pornography, a review of EU and national legislation is provided, with emphasis on the aspects relevant to forensic analysis. Given that file sharing on peer-to-peer networks is the channel on which the exchange of illicit material is most concentrated, an overview of the most widespread protocols and systems is provided, with emphasis on the eDonkey network and the eMule software, which are widely used among Italian users. The problems encountered in investigating and suppressing the phenomenon, which fall within the remit of the police forces, are outlined; the work then concentrates on its main contribution, the forensic analysis of computer systems seized from persons under investigation (or charged) for child pornography offences: the design and implementation of eMuleForensic makes it possible to analyse, extremely precisely and quickly, the events that occur when the eMule file sharing software is used; the software is available both online at the URL http://www.emuleforensic.com and as a tool within the DEFT forensic distribution. Finally, an operating protocol is proposed for the forensic analysis of computer systems involved in child pornography investigations.


Spatial and temporal characterisation of ground deformation recorded by geodetic techniques

Relevance: 100.00%

Abstract:

A critical point in the analysis of ground displacement time series is the development of data-driven methods that allow the different sources generating the observed displacements to be discerned and characterised. A widely used multivariate statistical technique is Principal Component Analysis (PCA), which reduces the dimensionality of the data space while retaining most of the variance of the dataset. However, PCA does not perform well in solving the so-called Blind Source Separation (BSS) problem, i.e. in recovering and separating the original sources that generated the observed data. This is mainly due to the assumptions on which PCA relies: it looks for a new Euclidean space where the projected data are uncorrelated. Independent Component Analysis (ICA) is a popular technique adopted to approach this problem. However, the independence condition is not easy to impose, and it is often necessary to introduce some approximations. To work around this problem, I use a variational Bayesian ICA (vbICA) method, which models the probability density function (pdf) of each source signal using a mixture of Gaussian distributions. This technique allows for more flexibility in the description of the pdf of the sources, giving a more reliable estimate of them. Here I present the application of the vbICA technique to GPS position time series. First, I use vbICA on synthetic data that simulate a seismic cycle (interseismic + coseismic + postseismic + seasonal + noise) and a volcanic source, and I study the ability of the algorithm to recover the original (known) sources of deformation. Secondly, I apply vbICA to different tectonically active scenarios, such as the 2009 L'Aquila (central Italy) earthquake, the 2012 Emilia (northern Italy) seismic sequence, and the 2006 Guerrero (Mexico) Slow Slip Event (SSE).


Development of original analytical methods for the therapeutic drug monitoring of CNS druges: Antipsychotics, Antidepressants and Anxiolytics-hypnotics

Relevance: 50.00%

Abstract:

Great strides have been made in the last few years in the pharmacological treatment of neuropsychiatric disorders, with the introduction into therapy of several new and more effective agents, which have improved the quality of life of many patients. Despite these advances, a large percentage of patients are still considered “non-responders” to therapy and draw no benefit from it. Moreover, these patients have a peculiar therapeutic profile, due to the very frequent use of polypharmacy in an attempt to obtain satisfactory remission of the multiple aspects of psychiatric syndromes. Therapy is heavily individualised and switching from one therapeutic agent to another is quite frequent. One of the main problems of this situation is the possibility of unwanted or unexpected pharmacological interactions, which can occur both during polypharmacy and during switching. Simultaneous administration of psychiatric drugs can easily lead to interactions if one of the administered compounds influences the metabolism of the others. Impaired CYP450 function due to inhibition of the enzyme is frequent. Other metabolic pathways, such as glucuronidation, can also be influenced. Therapeutic Drug Monitoring (TDM) of psychotropic drugs is an important tool for treatment personalisation and optimisation. It deals with the determination of the plasma levels of parent drugs and metabolites, in order to monitor them over time and to compare these findings with clinical data. This allows chemical-clinical correlations to be established (such as those between administered dose and therapeutic and side effects), which are essential to obtain the maximum therapeutic efficacy while minimising side and toxic effects. Hence the importance of developing sensitive and selective analytical methods for the determination of the administered drugs and their main metabolites, in order to obtain reliable data that can correctly support clinical decisions. During the three years of the Ph.D. program, several analytical methods based on HPLC were developed, validated and successfully applied to the TDM of psychiatric patients undergoing treatment with drugs belonging to the following classes: antipsychotics, antidepressants and anxiolytic-hypnotics. The biological matrices processed were blood, plasma, serum, saliva, urine, hair and rat brain. Among antipsychotics, both atypical and classical agents were considered, such as haloperidol, chlorpromazine, clotiapine, loxapine, risperidone (and 9-hydroxyrisperidone), clozapine (as well as N-desmethylclozapine and clozapine N-oxide) and quetiapine. While the need for accurate TDM of schizophrenic patients is being increasingly recognized by psychiatrists, only in the last few years has the same attention been paid to the TDM of depressed patients. This is leading to the acknowledgment that depression pharmacotherapy can greatly benefit from the accurate application of TDM. For this reason, the research activity also focused on first- and second-generation antidepressant agents, such as tricyclic antidepressants, trazodone and m-chlorophenylpiperazine (m-cpp), paroxetine and its three main metabolites, venlafaxine and its active metabolite, and the most recent antidepressant introduced onto the market, duloxetine. Among anxiolytic-hypnotics, benzodiazepines are very often involved in the pharmacotherapy of depression for the relief of anxious components; for this reason, it is useful to monitor these drugs, especially in cases of polypharmacy. The results obtained during these three years of the Ph.D. program are reliable, and the developed HPLC methods are suitable for the qualitative and quantitative determination of CNS drugs in biological fluids for TDM purposes.


Machine-learning methods for structure prediction of β-barrel membrane proteins

Relevance: 50.00%

Abstract:

Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins, which are specifically designed to be inserted into biological membranes and perform very important functions in the cell, such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty of experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented, based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. The results show that the model correctly predicts the topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performance of the proposed approach was comparable or superior to the current state of the art in TMBB topology prediction.
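For reference, the two scores quoted in the abstract, positive predictive value (PPV) and Matthews correlation coefficient (MCC), follow from the confusion matrix of a binary classifier. The sketch below uses the standard definitions; the function names and the counts in the usage lines are hypothetical and are not taken from the thesis.

```python
import math


def ppv(tp, fp):
    """Positive predictive value: fraction of predicted positives that are correct."""
    return tp / (tp + fp)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when a marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for a detector run on 100 proteins, 10 of which are true TMBBs.
print(ppv(tp=8, fp=2))                         # 0.8
print(round(mcc(tp=8, fp=2, fn=2, tn=88), 3))  # 0.778
```

MCC is commonly reported for detection tasks like this one because it stays informative when positives are rare, whereas accuracy alone can look high for a detector that finds almost nothing.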


Learning with Kernels on Graphs: DAG-based kernels, data streams and RNA function prediction.

Relevance: 50.00%

Abstract:

In many application domains data can be naturally represented as graphs. When analytical solutions to a given problem are infeasible, machine learning techniques can be a viable alternative. Classical machine learning techniques are defined for data represented in vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results in terms of both computational complexity and predictive performance. Kernel methods avoid an explicit mapping into vectorial form by relying on kernel functions, which informally are functions that compute a similarity measure between two entities. However, defining good kernels for graphs is a challenging problem because of the difficulty of finding a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results in terms of both computation and classification on real-world datasets. The second contribution consists in making the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled approach to memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods in this domain, obtaining state-of-the-art results.
